"THE URBAN TEACHER SELECTION INTERVIEW:
INTERNAL VALIDITY"

Maryan Baskin and Steven M. Ross 
Memphis State University 

Since a cumulative score is not computed for the Urban Teacher Selection 
Interview, item ratings do not appear to be operationally related to the overall ranking and 
final categorization of teacher education candidates. The purpose of this study was to 
determine the extent to which, during the final decision making process, the interviewers 
ranked and categorized candidates on the basis of the information collected during the 
interview. Subjects were 33 candidates for an alternative licensure program who had been
recommended for the program by administrators from four local school districts. Trained
interviewers worked in pairs to conduct the interviews. Candidates were scored, using the
Urban Teacher Selection Interview Continua Rating Form, on 14 items in seven areas:
Persistence, Response to Authority, Application of Generalizations, Approach to At-Risk
Students, Personal vs. Professional Orientation, Burnout, and Fallibility.

Multiple regression was used to determine which items best predicted the final
rankings. Only one item, Application of Generalizations-A, surfaced as a significant
predictor of final ranking (p < .019). Pearson product-moment correlations were computed to
determine the degree to which each predictor variable correlated with final ranking. Eight
of 14 items on the interview correlated significantly with the final ranking. However, of the
six items which did not correlate with final ranking, five were highly correlated with other
items within the interview.

Based on these findings, the Urban Teacher Selection Interview does seem to
have a reasonable degree of internal validity.

Much of the criticism leveled at education over the last decade has centered on teachers
and teacher education. Parents, businesspeople, legislators, and even school administrators have
voiced concerns that teachers are not prepared for today's classrooms. Many in the business of
preparing teachers for the profession, however, feel that if we are to overcome these criticisms
we must choose potential teachers more prudently (Smith & Coleman, 1991). To these
educators, careful selection of candidates for teacher education programs may be more important
than the actual training. Martin Haberman put it this way: "It is easier and wiser to select people
with attributes that will enable them to succeed in metropolitan schools than it is to expect that
individuals who might be sexist, racist, uncreative, uninterested in the world of ideas, rigid,
moralistic, humorless, or fearful will be transformed by virtue of completing a traditional teacher
education program" (1991, p. 1). Mickler and Solomon (1986, p. 340) agree: "Overall the
research suggests that teachers' technical skills and knowledge of content are relatively ineffective
in facilitating total student growth in the absence of supportive and positive relationships between
the teacher and the student." They conclude that teacher selection procedures which do not use
attitudes, behaviors, and life style as pre-employment measures are unfair to students and unwise
for school officials.

Applegate (1987, p. 2) notes that many of the cries for reform in education focus on 
teacher candidates, but she also writes "...historically only minimal attention has been given to 
the selection of those most able to teach." Kasambria (1984) found both grades and 
recommendations have been inflated to the point that they are virtually useless as tools for 
helping to make appropriate decisions about teacher candidates. Choices must be based on other 
factors. 

Dissatisfaction with traditional selection variables has shifted teacher educators' attention to the
qualities that appear to distinguish effective teachers and to attempts to discover the degree to
which those qualities are possessed by candidates. Jane Stallings (1992), writing about teachers
in inner-city schools, lists nine personal characteristics, three basic knowledge skills, and six
lengthy pedagogical knowledge and skills objectives considered to be attributes of effective
teachers. These range from "sense of personal efficacy" to "observational and interpretive skills
needed to reflect on instruction."

Several interviews have been devised by personnel departments, universities, and
consultants to assist school districts in selecting teachers who have the greatest potential for
success in the classroom: people who have the attributes listed by Stallings (1992) and by others.
Of the structured interviews available to school district personnel, the Minnesota Teacher Attitude
Inventory is perhaps the most notable. Others include the Omaha Teacher Interview, Teacher
Perceiver Interview (developed by Selection Research Incorporated of Lincoln, Nebraska), and
many interview instruments developed specifically for particular districts. Formal, structured
interviews are generally believed to provide the best assessment because they can be more readily
validated through research (Baker & Morris, 1990).

All of these measures attempt to define certain qualities which teachers should possess and
then to construct questions to determine if the applicant has those qualities. The reaction to the
overall effectiveness of these measures is mixed (Miller, 1977; Wong, 1989). Educators
continue to be thwarted in their efforts to discern such skills in applicants both to teaching
preparation programs and to teaching assignments. All of the characteristics listed by Stallings
(1992), for example, have definite "face" appeal: they look right. According to the American
Psychological Association Standards (1974, p. vii), though, "'Face' validity, the mere appearance
of validity, is not an acceptable basis for interpretive inferences from test scores." How, then,
does one determine whether an applicant possesses the required qualities and whether the
administrator is asking questions that will discriminate between successful and unsuccessful
teachers? Stone (1978) says, "The success of virtually all personnel selection techniques (e.g.,
testing, interviewing, etc.) rests upon their criterion-related validity." The question of validity
remains unanswered for most interview processes currently used to predict the success of
teachers. Tuckman (1972) notes that questionnaire items are usually reviewed for clarity and
distribution of responses without necessarily running an item analysis. In order to know whether
or not such interviews are able to identify characteristics of successful teachers, both internal and
external validity must be addressed.

The relationship between the characteristics measured in the interview and the actual selection
made as a result of the interview can help determine the internal validity of the decisions. Tuckman
(1972) feels the researcher must constantly ask of the items: Is this what I want to be
measuring? (p. 192). He concludes that the larger the correlation between an item score and the total
score, the greater the relationship between what the item is measuring and what the total scale
is measuring (p. 199). When validation studies were done on the Teacher Perceiver Interview
(Sailor, 1984) and the Minnesota Teacher Attitude Inventory (Wong, 1989), for example,
statistical analysis showed that total scores on shorter versions correlated highly with total scores
on the original forms. In each case some items on the original interview contributed more
strongly to total score than did others. Clearly, appropriate statistical validation of teacher
selection instruments can save time in the selection procedure and, more importantly, assure
educators that they are measuring the criteria they have identified.
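
The item-total criterion attributed to Tuckman can be illustrated with a short computation. The following is a minimal sketch and not part of the original study: the function names and the small score matrix are hypothetical, and the "total" is assumed to be the simple sum of item scores (an assumption made only for illustration).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_total_correlations(scores):
    """Correlate each item (column) with the total score across items.

    `scores` is a list of rows, one row of item scores per respondent.
    """
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson_r([row[j] for row in scores], totals) for j in range(n_items)]

# Hypothetical data: 5 respondents rated on 3 items (0-3 scale).
example_scores = [
    [0.5, 1.0, 2.0],
    [1.0, 1.5, 1.0],
    [2.0, 2.0, 3.0],
    [2.5, 3.0, 0.5],
    [3.0, 2.5, 2.5],
]
item_total = item_total_correlations(example_scores)
```

Items with larger values in `item_total` are, by this criterion, measuring more of what the total scale measures.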
The Urban Teacher Selection Interview

The Urban Teacher Selection Interview has been developed by Martin Haberman over
a period of 32 years and reflects four decades of change in urban schools in the United
States. As early as 1958 Haberman began reviewing and researching personality tests as
predictors of effective teachers. The work of Robert K. Merton in the 1960s presented a
sociological analysis of professions which Haberman has applied to the task of predicting
teacher success. Merton identified two extremes: on the left were personality traits which
individuals could be expected to demonstrate regardless of the situation, and on the right were
the behaviors which would be effective for all teachers in a specific situation. Merton
advocated that each profession develop "mid-range functions" somewhere between these two
extremes, that is, groups of behaviors that particular practitioners must demonstrate in order
to be effective. Originally, Haberman identified eight mid-range functions for teachers.
Over a period of years these mid-range functions were reevaluated and refined into the seven
mid-range functions which currently appear on the interview. When the interview was used
in the Milwaukee Public Schools, an error rate of approximately 1 percent was reported
between interview prediction and actual performance of teachers (Haberman, 1991).

The "Mid-Range Functions" identified by Haberman are: Persistence, Response to
Authority, Application of Generalizations, Approach to At-Risk Students, Personal vs.
Professional Orientation Toward Teaching, Burnout, and Fallibility. Persistence is identified
in interviews by two questions that look for tenacity, commitment, and a perception of the
teacher's daily job. It attempts to identify people who will continually seek solutions to the
never-ending problems of a classroom. Respondents are asked to imagine a problem they
might encounter as a beginning teacher, to suggest several ways to deal with that particular
problem, and to estimate how often they might have to think about a problem like this.

The second Mid-Range Function, Response to Authority, seeks to determine the
respondent's willingness to support student learning "in the face of or even against school
policy." Candidates are asked to identify an activity they would undertake in spite of the fact
that their administrators might not support the activity. Scoring is based on how they
respond to an irrational, dogmatic authority who might say, for example, "I don't care if the
children are learning, stop this activity in your classroom."

Application of Generalizations determines the degree to which the respondent is able 
to deal with universal statements about human behavior. When a broad principle has been
identified by candidates, they are asked to describe how beliefs in this principle might be 
demonstrated in their own classrooms. Can the candidate apply principles to practice? 

Approach to At-Risk Students seeks to discover if the candidate understands that it is
her/his professional responsibility as a teacher to constantly find effective curricula and
methods of instruction regardless of the problems faced by at-risk children. Candidates who
blame a child's failure on the child, the parents, or the situation (e.g., the socioeconomic
background) have not responded appropriately.

The fifth function, Personal versus Professional Orientation to Teaching, intends to
give the interviewer insight into the candidate's expectations of pupils and his/her need for
support from students. Teachers who enter the profession because they "just love
children" are seeking to fulfill their own emotional needs and will be disappointed, while
those with more professional expectations regarding teaching will be less likely to experience
this same type of dissatisfaction.

Burnout is the term used by Haberman to represent the enormous physical and 
emotional drain teachers encounter. Respondents are asked to explain some causes of low 
teacher morale and then to suggest how they might find ways to deal with burnout. 

The last function, Fallibility, looks for the candidate's ability to accept
himself/herself and to accept others. Respondents are asked to think of a mistake they might make
as a teacher and to propose ways they would deal with such a mistake.

The Haberman interview differs from most other interviews in two ways. First, if a
candidate receives a 0 rating on any function, he/she is considered to have failed the
interview. Second, candidates do not receive an overall, cumulative score. Rather, at the
end of the first three interviews, those conducting the interview are asked to discuss and
agree upon a ranking of the candidates and an assignment of each candidate to a category
(Star, High, Average, or Failure). Additional candidates who are interviewed are also
assigned a category and are fitted into the total rank order begun by the first three.
Haberman further asks interviewers to make separate decisions on every item, including the
overall rating.

The Urban Teacher Selection Interview was adopted as a primary selection tool in a 
new alternative licensure program at Memphis State University. Memphis, and the 
surrounding area, offer unique opportunities for training teachers. The city itself faces many 
of the urban and inner-city problems that other metropolitan areas must confront, and often 
new teachers do not feel prepared to deal with the daily crises which occur. The program 
incorporates strict selection procedures (of the initial 1,500+ applicants, only 16 were
selected); intensive course preparation which includes 8 hours a day, 5 days a week during
the first summer and two courses during each of the following two semesters; and a year of 
teaching in a regular classroom with constant mentoring by both university personnel and an 
experienced teacher in the building where they work. Given the enormous commitment of 
time and energy for the program and the challenges of urban teaching, heavy emphasis was
placed on selecting students who would ultimately be successful. After deliberation, the
Urban Teacher Selection Interview was chosen as the final discriminating measure for
candidates who had met the other criteria imposed by the university and the state.

Based on the extensive uses of the Haberman method (Haberman, 1991), the questions
and the process appear to provide candidates ample opportunity to demonstrate their
qualifications and potential to teach effectively. Since a cumulative score is not computed,
item ratings are not operationally related to the overall ranking and categorization, with the
exception of the standard that a 0 score on any item constitutes failure of the interview. A
natural question of interest, and the one examined here, is to what extent the overall rating is
a function of the individual items. Ideal expectations were that each of the 14 items would
contribute fairly equally to the candidate's final ranking, and that each item would make a unique
(additive) contribution to the final ranking.

Method 

Subjects were 33 candidates for the DeWitt Wallace-Reader's Digest Scholars program at
Memphis State University. These candidates were selected by administrators from four local
city or county school districts. At the time of the study candidates were all working as
substitute teachers in these districts. Each had an undergraduate degree with at least a 2.5
grade point average in the last 60 hours, had taken the Miller Analogies Test with a
minimum score of 40, and had submitted two letters of recommendation from principals in
their respective districts. In their recommendations, principals were asked to rate
each candidate on a scale of "Very successful (1), Successful (2), Average (3), Marginal (4),
and I Have Serious Concerns (5)." Immediately before the interview, candidates were also
asked to spend about 30 minutes writing on an assigned topic under the supervision of a
faculty member. These writings were scored as Good (1), Average (2), Fair (3), and Poor
(4). Of the 33 candidates, 16 were chosen for the Scholars program based on their
performance on the Haberman Teacher Selection Interview.

Interviewers worked in pairs to conduct the interview, but scored separately. As
previously described, at the end of three interviews, the two interviewers discussed the three
candidates and agreed on a ranking of them. Each additional set of three interviews required
the interviewers to add the three new candidates to the order of those they had already ranked.
That is, the first three candidates were ranked 1, 2, 3. The next three candidates were
ranked among those first three such that the best candidate of the six was ranked first, the
second best second, and so on.

On the interview, the seven Mid-Range Functions are divided into two questions each
for a total of 14 areas in which the teacher candidate was rated. Each item was scored on a
continuum of 0-3. For example, item I.A. could have been scored like this:

0    X    1         2         3

The candidate would have received one half of a point on that item. Candidates could score
anywhere along the continuum (.1, 2.3, 1.75, etc.). Each subpart (question) was considered
to be as important as the others.



Procedure 

The interviews were conducted in the spring of 1993 by faculty and graduate assistants
from Memphis State University who had been trained in the Haberman Interview process.
Interviewers attended a rigorous eight-hour training workshop conducted by Dr. Gabriel
Barrow and Delia Stafford, who have conducted more than 1,400 Urban Teacher Selection
Interviews themselves and are the official training team for the Interview. Each Mid-Range
Function was discussed in detail and interviewers were given the opportunity for guided
practice on each item. In addition, Dr. Haberman visited the campus and discussed the
development of the Interview with faculty and school district personnel who would be
involved. The Urban Teacher Selection Interview Continua Rating Form and ranking
procedure were used to score candidates. A template was used to divide each interval on the
rating scale into quarter-point steps scored as .25, .50, .75, ... 2.75, 3.00. Each candidate was
interviewed by two interviewers jointly. At the completion of the first three interviews and
subsequent interviews, interviewers discussed and devised rankings according to the
procedures described in the previous section. Consequently, interviewers produced a final
ranking of all candidates with their best candidate ranked one, second best ranked two, etc.
Eight pairs of trained interviewers conducted the interviews during one evening. Each pair
was assigned four or five candidates, thus producing final rankings ranging from 1 to 4 or 1
to 5. However, following the Haberman procedure, if a candidate received a "0" on any
item he/she was automatically given a final rank of 5.
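
The two mechanical scoring rules described in this section can be sketched in a few lines. This is an illustrative sketch only, not the authors' procedure: the function names are hypothetical, it assumes the template snaps a mark to the nearest of the listed values (.25, .50, ... 3.00), and it assumes the automatic-failure rule is applied to the recorded item scores.

```python
FAIL_RANK = 5  # rank assigned automatically when any item is scored 0

def snap_to_quarter(position):
    """Snap a mark on the 0-3 continuum to the nearest quarter point."""
    if not 0 <= position <= 3:
        raise ValueError("mark must lie on the 0-3 continuum")
    return round(position * 4) / 4

def final_rank(item_scores, judged_rank):
    """Return the interviewers' judged rank unless any item was scored 0."""
    if any(score == 0 for score in item_scores):
        return FAIL_RANK
    return judged_rank
```

Note that the judged rank itself is passed in as an argument: apart from the zero rule, the ranking is a holistic decision by the interviewers and is not computed from the item scores.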

Results

As discussed earlier, each score mark on the Interview Rating Form was given a
numerical value (for example, .5 or 2.25). Stepwise multiple linear regression was used to
determine which items best predicted the final rankings (possible range = 1 to 5). When the
rankings were entered as the dependent variable in the regression analysis, and all 14 of the
interview items, GPA, MAT scores, principals' recommendation scores, and writing sample
scores were employed as independent variables (SPSS Base System, 1992), only one item
(Application of Generalizations-A) emerged as a significant predictor of ranking, R = .46,
p = .019. That is, the higher the score on this item, the better the final ranking.

Of the four variables which were not part of the interview, principals'
recommendation had the highest correlation with ranking (r = .462) while writing sample
correlated at a less significant level (r = .297), and undergraduate grade point average and
MAT scores did not correlate significantly with ranking.

Given the expectation that many of the interview variables would be intercorrelated
and thus share common variance in predicting ranking, simple correlations were examined to
determine the degree to which each predictor variable correlated with final ranking. Table 1
summarizes the intercorrelations among the Urban Teacher Selection Interview items; Table
2 summarizes the Pearson product-moment correlations between the individual predictor
variables and final ranking.

Insert Tables 1 and 2 about here



As shown in Table 1, only two sets of items had high intercorrelations: Persistence-B
with Response to Authority-A (.717) and Persistence-A with Persistence-B (.665).
Moderately high correlations were found between Personal vs. Professional Orientation-A and
Personal vs. Professional Orientation-B (.631), Approach to At-Risk Students-A and
Approach to At-Risk Students-B (.628), Burnout-B and Fallibility-B (.619), and Persistence-A
and Response to Authority-A (.606). Nineteen other sets of items were moderately
correlated, ranging from .370 to .585. Generally, the "A" questions correlated more highly
with "B" questions (17 pairs) than with different questions. Specifically, for all seven of the
Mid-Range Functions the "A" question correlated with its counterpart "B" question at a
moderate to moderately high level (range = .370 to .665). Eight of the pairs had negative
correlations, though none of these was significant. The lowest r among the items was .006
for Application of Generalizations-A with Personal vs. Professional Orientation-A.

Eight of the fourteen items correlated moderately and significantly with final ranking
(see Table 2). The highest correlation was between ranking and Application of
Generalizations-A (r = .496). The others, in order of the strength of their correlations, were
Approach to At-Risk Students-B (r = .439), Approach to At-Risk Students-A (r = .425),
Burnout-B (r = .417), Response to Authority-B (r = .376), Burnout-A (r = .358), Persistence-B
(r = .357), and Fallibility-B (r = .306). Generally the questions on section "B" of each
Mid-Range Function correlated higher with final ranking than did the "A" questions. The lowest
correlation of any of the items with final ranking was Personal vs. Professional Orientation-A
(r = .045).
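
The significance levels attached to these correlations can be approximated from r and the sample size alone. The sketch below is an illustration rather than the authors' procedure (they report using SPSS): it converts r to a t statistic with n - 2 degrees of freedom and integrates the t density numerically. It assumes n = 33 for the interview items and a one-tailed test; both are inferences from the tabled values, not statements made in the paper, and the function names are hypothetical.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tailed_p(r, n, steps=2000):
    """Upper-tail p value for r, via Simpson's rule on the t density."""
    df = n - 2
    t = abs(t_from_r(r, n))
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # integral of the density from 0 to t
    return 0.5 - area

p_app_gen_a = one_tailed_p(0.496, 33)  # Application of Generalizations-A
p_persist_a = one_tailed_p(0.240, 33)  # Persistence-A
```

Under these assumptions the two computed values come out close to the .002 and .089 listed for these items in Table 2. The same check does not reproduce the tabled p values for GPA or principals' recommendations, which may reflect a different n for those variables.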

Conclusions and Discussion 

The purpose of this study was to determine the consistency and predictability of items
composing the Urban Teacher Selection Interview. In regressing final ranking on the 14
interview items and other variables, only one variable, Application of Generalizations-A, was
selected for entry into the regression equation (r = .496). This item asks candidates to state
some principle they believe is true about education. However, other variables, such as
Principals' Recommendations (r = .462), Burnout-B (r = .417), and Approach to At-Risk
Students-A (r = .425), were almost as strong in predicting final ranking, but did not account
for sufficient unique variance to be entered.
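
Why a variable with a sizable simple correlation can still fail to enter a stepwise equation can be illustrated with a first-order partial correlation. This is a sketch under stated assumptions, not a reanalysis of the data: the two correlations with ranking (.496 and .462) come from Table 2, but the correlation between the two predictors is not reported in the paper, so the value .50 used below is purely hypothetical, as are the function names.

```python
import math

def partial_r(r_y2, r_y1, r_12):
    """Correlation of y with predictor 2 after removing predictor 1 from both."""
    return (r_y2 - r_y1 * r_12) / math.sqrt((1 - r_y1 ** 2) * (1 - r_12 ** 2))

def t_for_partial(r_partial, n, n_controlled=1):
    """t statistic for a partial correlation (df = n - 2 - n_controlled)."""
    df = n - 2 - n_controlled
    return r_partial * math.sqrt(df) / math.sqrt(1 - r_partial ** 2)

r_rank_appgen = 0.496     # Table 2: ranking with Application of Generalizations-A
r_rank_principal = 0.462  # Table 2: ranking with principals' recommendations
r_between = 0.50          # HYPOTHETICAL: not reported in the paper

r_unique = partial_r(r_rank_principal, r_rank_appgen, r_between)
t_unique = t_for_partial(r_unique, n=33)
```

With these illustrative numbers the partial correlation falls to about .28 and t to about 1.63 on 30 degrees of freedom, below the conventional two-tailed .05 critical value of roughly 2.04, so the second predictor would not be entered even though its simple correlation with ranking is nearly as large as the first.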

Application of Generalizations-A thus emerged as the most discriminating of the
individual items. When asked the associated interview question, candidates often responded
with "I believe all children can learn" or "I believe learning can be fun and relevant to daily
lives." These responses, as well as the responses to other strong predictors such as Burnout,
may have been a function of the amount of time a candidate had been in a classroom/teaching
situation. That is, it would probably be difficult for younger or less experienced applicants to
a teacher education program to generalize about education in a convincing and reasonable
way. The influence of the candidate's response on the final decision might also be a
function of the philosophies of the interviewers: if the candidate selects a generalization about
education with which the interviewer agrees, the interviewer would value that candidate more
highly.

The correlational results indicated that 8 of the 14 items on the interview correlated
significantly with the final ranking, although the final ranking on the interview is not derived
directly from the item scores. Perhaps more interesting, however, is the significant and
relatively high correlation between final ranking and principals' recommendations. Both the
rankings and the principals' recommendations were made independently, and interviewers
were not aware of the candidates' scores on either. While the principals' recommendation-
ranking correlation is not overly strong, it still suggests that principals, university faculty,
and trained interviewers tended to perceive the candidates in a similar manner. Seemingly,
they all look for the same qualities or they have all been trained in various educational
institutions to distinguish certain qualities that will be acceptable to the education system. It
should also be noted that the traditional measures for selecting teacher education candidates for
graduate studies (undergraduate GPA, r = .255; MAT scores, r = .064; and writing sample,
r = .297) correlated very weakly with ranking. Educators apparently look for characteristics
other than academic performance when selecting potential teachers.

The findings indicated that six of the items on the Urban Teacher Selection Interview
did not correlate with final ranking (see Table 2). Five of those six, though, were highly
correlated with other items within the interview (see Table 1). The final ratings, then, seem
to be based on a holistic judgment of the interviewer that is only weakly related to the item
responses. Also, the fact that some of the items correlated highly with one another raises the
question of whether such a long interview is really needed. The "A" questions, for example,
tended to overlap highly with their counterpart "B" questions. Both questions, therefore, may
not be needed to make a final judgment.

It should be noted that the present study used a small sample. Follow-up research 
with a larger n would certainly be required before any of the items were eliminated from the 
interview. Other questions which could be addressed in subsequent research are: How do 
the rankings on this instrument relate to correlations based on industrial interview procedures 
and to job performance measures (Sailor, 1984)? Does the Haberman interview correlate 
with other measures of personality characteristics (Mickler and Solomon, 1986)? Are 
expected responses on the Haberman instrument beyond the experience of those who have not 
yet been in the classroom (Leeds, 1969)? Most importantly, how well does performance on 
this interview predict success in teaching? The latter question will be the focus of subsequent 
research conducted as the present sample and later cohorts complete the alternative licensure 
program and adopt teaching positions in the schools.

Table 2

Pearson Correlations Between Predictor Variables and Final Ranking

                                           Correlation
Predictor Variable                         with Ranking      p

Persistence-A                                  .240         .089
Persistence-B                                  .357         .021
Response to Authority-A                        .145         .210
Response to Authority-B                        .376         .015
Application of Generalizations-A               .496         .002
Application of Generalizations-B               .237         .092
Approach to At-Risk Students-A                 .425         .007
Approach to At-Risk Students-B                 .439         .005
Personal vs. Professional Orientation-A        .045         .403
Personal vs. Professional Orientation-B        .066         .359
Burnout-A                                      .358         .020
Burnout-B                                      .417         .008
Fallibility-A                                  .236         .093
Fallibility-B                                  .306         .042
GPA                                            .255         .100
MAT Scores                                     .064         .364
Principals' Recommendations                    .462         .008
Writing Sample                                 .297         .047